Generative Adversarial Networks (GANs) are a popular class of generative neural networks that have proven effective on a wide range of computer vision problems. A GAN trains a pair of networks, a Generator and a Discriminator, with competing loss terms. As an analogy, think of one network as an art forger and the other as an art expert: in GAN literature the Generator is the forger and the Discriminator is the expert. The Generator is trained to produce fake images (forgeries) that deceive the Discriminator. The Discriminator, which receives both real and fake images, tries to tell them apart and identify the fakes. The Generator uses the feedback from the Discriminator to improve its generation. Both models are trained simultaneously and are always in competition with each other, and this competition drives each of them to improve continuously. The model converges when the Generator produces fake images that are indistinguishable from the real images.
In this setup, the Generator does not have access to the real images whereas the Discriminator has access to both the real and the generated fake images.
Let us define a Discriminator D that takes an image as input and produces a score between 0 (fake) and 1 (real) as output, and a Generator G that takes random noise as input and outputs a fake image. In practice, G and D are trained alternately: for a fixed Generator G, the Discriminator D is trained to classify its inputs as real (output a value close to 1) or fake (output a value close to 0). Subsequently, we freeze the Discriminator and train the Generator G to produce a (fake) image that yields a value close to 1 (real) when passed through the Discriminator D. Thus, if the Generator is perfectly trained, the Discriminator D will be maximally confused by the images generated by G and predict 0.5 for all inputs.
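The alternating scheme described above corresponds to the standard GAN minimax objective. As a sketch, writing $p_{\text{data}}$ for the distribution of real images, $p_z$ for the noise distribution, and $p_g$ for the distribution of generated images (these symbols are notation introduced here, not names used elsewhere in this notebook):

```latex
\min_G \max_D \; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big],
\qquad
D^{*}(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}
```

Here $D^{*}$ is the Discriminator that maximizes $V$ for a fixed G. When the Generator matches the data distribution, $p_g = p_{\text{data}}$, this gives $D^{*}(x) = 1/2$ everywhere, which is the "maximally confused" prediction of 0.5.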
It is best to work on this assignment on a computer with a GPU. The Coursera platform does not support a GPU, so you may want to explore Google Colab or Kaggle.
Along with the Python notebook, submit an .html export of the notebook saved after executing all the cells, so that the outputs are included.
In this assignment, we will implement a Generative Adversarial Network on MNIST data and generate images that resemble the digits from the MNIST dataset.
To implement a GAN, we require 5 components:
Let us implement each of the parts and train the overall model:
## import packages
import torch
import random
import numpy as np
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
import torch.optim as optim
from torch.utils.data import DataLoader
from torch.utils.data import sampler
import torchvision.datasets as dset
import os
import numpy.testing as npt
#from torchsummary import summary
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
## Checks for the availability of GPU
is_cuda = torch.cuda.is_available()
#is_cuda = False
if is_cuda:
    print("working on gpu!")
else:
    print("No gpu! only cpu ;)")
## The following random seeds are just for deterministic behaviour of the code and evaluation
##############################################################################
################### DO NOT MODIFY THE CODE BELOW #############################
##############################################################################
random.seed(0)
np.random.seed(0)
torch.manual_seed(0)
torch.cuda.manual_seed_all(0)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
os.environ['PYTHONHASHSEED'] = '0'
###############################################################################
In this step we prepare the data. We normalize the images to the range [-1, +1].
import torchvision
import torchvision.transforms as transforms
import os
root = './data/'
if not os.path.isdir(root):
    os.mkdir(root)
train_bs = 128
# Data transformation for the DataLoader - normalizes to between [-1,1]
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize(mean=[0.5], std=[0.5])])
training_data = torchvision.datasets.MNIST(root, train=True, transform=transform,download=True)
train_loader = torch.utils.data.DataLoader(dataset=training_data, batch_size=train_bs, shuffle=True, drop_last=True)
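To see why a mean of 0.5 and a standard deviation of 0.5 give the range [-1, 1]: ToTensor() scales pixel values to [0, 1], and Normalize then computes (x - mean) / std per channel. The following is a minimal NumPy sketch of that arithmetic on a few hand-picked values; `normalize` and `denormalize` are hypothetical helper names used only for illustration and are not part of torchvision.

```python
import numpy as np

def normalize(x, mean=0.5, std=0.5):
    """Same arithmetic as transforms.Normalize: (x - mean) / std."""
    return (x - mean) / std

def denormalize(y, mean=0.5, std=0.5):
    """Inverse mapping, handy for displaying generated images in [0, 1]."""
    return y * std + mean

pixels = np.array([0.0, 0.25, 0.5, 1.0])   # values as produced by ToTensor()
scaled = normalize(pixels)                 # values -1, -0.5, 0, 1
restored = denormalize(scaled)             # back to 0, 0.25, 0.5, 1
print(scaled)
print(restored)
```

The same inverse mapping can be applied to Generator outputs, which also lie in [-1, 1] because of the final tanh.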
Let us define a function that takes (batch size, dimension) as input and returns a random noise tensor of the requested shape. This noise tensor will be the input to the Generator.
def noise(bs, dim):
    """Generate random Gaussian noise vectors N(0,I), with mean 0 and variance 1.
    Inputs:
    - bs: integer giving the batch size of noise to generate.
    - dim: integer giving the dimension of the Gaussian noise to generate.
    Returns:
    A PyTorch Tensor containing Gaussian noise with shape [bs, dim]
    """
    out = (torch.randn((bs, dim)))
    if is_cuda:
        out = out.cuda()
    return out
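A quick sanity check of what the Generator input looks like may help here. The sketch below is a device-agnostic variant, assuming a hypothetical helper name `sample_noise` (the notebook's own function is `noise` above); it creates the tensor directly on the target device through the `device` argument of `torch.randn`.

```python
import torch

def sample_noise(bs, dim, device=None):
    """Hypothetical variant of noise(): draw N(0, I) samples of shape [bs, dim]."""
    if device is None:
        # Assumption: fall back to the GPU only when one is available.
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    return torch.randn(bs, dim, device=device)

z = sample_noise(128, 100)
print(z.shape)                          # torch.Size([128, 100])
print(z.mean().item(), z.std().item())  # roughly 0 and 1 for a sample this large
```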
Define a Generator with the following architecture.
TanH (To scale the generated images to [-1,1], the same as real images)
LeakyRELU: https://pytorch.org/docs/stable/nn.html#leakyrelu
class Generator(nn.Module):
    def __init__(self, noise_dim=100, out_size=784):
        super(Generator, self).__init__()
        '''
        REST OF THE MODEL HERE
        # define a fully connected layer (self.layer1) from noise_dim -> 256 neurons
        # define a leaky relu layer (self.leaky_relu) with negative slope=0.2. We can reuse the same layer multiple times.
        # define a fully connected layer (self.layer2) from 256 -> 512 neurons
        # define a fully connected layer (self.layer3) from 512 -> 1024 neurons
        # define a fully connected layer (self.layer4) from 1024 -> out_size neurons
        # define a tanh activation function (self.tanh)
        '''
        # your code here
        self.layer1 = nn.Linear(noise_dim, 256)
        self.leaky_relu = nn.LeakyReLU(0.2, inplace=False)
        self.layer2 = nn.Linear(256, 512)
        self.layer3 = nn.Linear(512, 1024)
        self.layer4 = nn.Linear(1024, out_size)
        self.tanh = nn.Tanh()
    def forward(self, x):
        '''
        Make a forward pass of the input through the generator. Leaky relu is used as the activation
        function in all the intermediate layers. Tanh activation function is only used at the end (which
        is after self.layer4)
        Note that the generator takes random noise as input and gives out fake "images". Hence, the Tensor
        output after the tanh activation function should be reshaped into the same size as the real images, i.e.,
        [batch_size, n_channels, H, W] == (batch_size, 1,28,28). You may use the .view(.) function to achieve it.
        '''
        # your code here
        batch_size = x.shape[0]
        x = self.layer1(x)
        x = self.leaky_relu(x)
        x = self.layer2(x)
        x = self.leaky_relu(x)
        x = self.layer3(x)
        x = self.leaky_relu(x)
        x = self.layer4(x)
        x = self.tanh(x)
        return x.view(batch_size,1,28,28)
# Initialize the Generator and move it to GPU (if is_cuda)
generator = Generator()
print(generator)
# If you have a system with a GPU, you may want to install torchsummary and display the network in more detail
# summary(generator,(100,), device='cpu')
# move to GPU
if is_cuda:
    generator = generator.cuda()
# Test cases
# Note: the test cases only check the input and output dimensions and the range of values.
# You may modify the architecture within those constraints
# noise_dim is always 100
# Input to the Generator is (B,noise_dim), where B is an arbitrary batch size
# Output of the Generator is (B,1,28,28), where B is an arbitrary batch size, 1 is the grayscale channel and 28 is the image size
# The Generator output is in [-1,1] since we use tanh() activations.
# Input to the Discriminator is (B,1,28,28), where B is an arbitrary batch size, 1 is the grayscale channel and 28 is the image size
# Output of the Discriminator is a Tensor of dimension (B,1), where B is an arbitrary batch size
a = torch.ones(5,100)
if is_cuda:
    a = a.cuda()
out = generator(a)
npt.assert_equal(out.shape, (5,1,28,28))
assert np.max(out.detach().cpu().numpy()) <= 1
assert np.min(out.detach().cpu().numpy()) >= -1
# Hidden test cases follow
Define a Discriminator with the following architecture.
## Similar to the Generator, we now define a Discriminator which takes in a vector and outputs a single scalar
## value.
class Discriminator(nn.Module):
    def __init__(self, input_size=784):
        super(Discriminator, self).__init__()
        '''
        REST OF THE MODEL HERE
        # define a fully connected layer (self.layer1) from input_size -> 512 neurons
        # define a leaky relu layer (self.leaky_relu) with negative slope=0.2. (we will reuse the same layer)
        # define a fully connected layer (self.layer2) from 512 -> 256 neurons
        # define a fully connected layer (self.layer3) from 256 -> 1 neurons
        '''
        # your code here
        self.layer1 = nn.Linear(input_size, 512)
        self.leaky_relu = nn.LeakyReLU(0.2, inplace=False)
        self.layer2 = nn.Linear(512, 256)
        self.layer3 = nn.Linear(256, 1)
    def forward(self, x):
        '''
        The Discriminator takes a vectorized input of the real and generated fake images. Reshape the input
        to match the Discriminator architecture.
        Make a forward pass of the input through the Discriminator and return the scalar output of the
        Discriminator.
        '''
        # your code here
        # Flatten (B,1,28,28) images into (B,784) vectors
        flattened_x = x.view(x.size(0), -1)
        x = self.layer1(flattened_x)
        x = self.leaky_relu(x)
        x = self.layer2(x)
        x = self.leaky_relu(x)
        x = self.layer3(x)
        return x
# Initialize the Discriminator and move it to GPU (if is_cuda)
discriminator = Discriminator()
print(discriminator)
# If you have a system with a GPU, you may want to install torchsummary and display the network in more detail
# summary(discriminator,(784,), device='cpu')
# move to GPU
if is_cuda:
    discriminator = discriminator.cuda()
# Test cases
# Note: the test cases only check the input and output dimensions and the range of values.
# You may modify the architecture within those constraints
# noise_dim is always 100
# Input to the Generator is (B,noise_dim), where B is an arbitrary batch size
# Output of the Generator is (B,1,28,28), where B is an arbitrary batch size, 1 is the grayscale channel and 28 is the image size
# The Generator output is in [-1,1] since we use tanh() activations.
# Input to the Discriminator is (B,1,28,28), where B is an arbitrary batch size, 1 is the grayscale channel and 28 is the image size
# Output of the Discriminator is a Tensor of dimension (B,1), where B is an arbitrary batch size
a = torch.ones(5,1,28,28)
if is_cuda:
    a = a.cuda()
out = discriminator(a)
npt.assert_equal(out.shape, (5,1))
# Hidden testcases follow
We will use the binary cross-entropy loss to train the GAN. The loss function applies a sigmoid activation followed by the logistic loss, so the Discriminator can output raw logits. Minimizing it trains the Discriminator to distinguish between real and fake images.
Binary cross entropy loss with logits: https://pytorch.org/docs/stable/nn.html#bcewithlogitsloss
# Initialize the 'BCEWithLogitsLoss' object
bce_loss = nn.BCEWithLogitsLoss()
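As a rough illustration of what this loss computes for a single logit x and target t: it applies the sigmoid and then the binary cross-entropy, -[t log σ(x) + (1 - t) log(1 - σ(x))]. The sketch below works this out by hand in plain Python. `bce_naive` and `bce_with_logits` are hypothetical helpers written for illustration, and the rearranged form in the second one is a common numerically stable way to write the same quantity, not necessarily how PyTorch implements it internally.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce_naive(x, t):
    """Sigmoid followed by binary cross-entropy, written out directly."""
    p = sigmoid(x)
    return -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))

def bce_with_logits(x, t):
    """Algebraically equal form that avoids log(0) for large |x|."""
    return max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))

# A logit of 0 means sigmoid(0) = 0.5, so the loss is log(2) for any target.
print(bce_naive(0.0, 1.0), bce_with_logits(0.0, 1.0), math.log(2.0))

# A confident, correct prediction gives a small loss; a confident, wrong one a large loss.
print(bce_with_logits(4.0, 1.0))
print(bce_with_logits(4.0, 0.0))
```

Working with logits rather than probabilities is the reason the Discriminator above has no sigmoid layer of its own.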
Let's define the objective function for the Discriminator. It takes as input the logits (outputs of the Discriminator) and the labels (real or fake), and uses BCEWithLogitsLoss() to compute the classification loss.
def DLoss(logits_real, logits_fake, targets_real, targets_fake):
    '''
    Returns the Binary Cross Entropy Loss between predictions and targets
    Inputs:
    logits_real: the outputs of the discriminator (before the sigmoid) for real images
    logits_fake: the outputs of the discriminator (before the sigmoid) for fake images
    targets_real: groundtruth labels for real images
    targets_fake: groundtruth labels for fake images
    '''
    # Concatenate the logits_real and the logits_fake using torch.cat() to get 'logits'
    # Concatenate the targets_real and the targets_fake using torch.cat() to get 'targets'
    # estimate the loss using the BCEWithLogitsLoss object 'bce_loss' with 'logits' and 'targets'
    # your code here
    # Concatenate along the batch dimension (dim 0); the mean loss is the same as for dim 1
    # when both batches have equal size, and dim 0 also works when the sizes differ.
    logits = torch.cat((logits_real, logits_fake), 0)
    targets = torch.cat((targets_real, targets_fake), 0)
    loss = bce_loss(logits, targets)
    return loss
# Hidden testcases follow
Let's define the objective function for the Generator. It takes as input the logits (outputs of the Discriminator) for the fake images it has generated and the labels (real), and uses BCEWithLogitsLoss() to compute the classification loss. The Generator wants the Discriminator's predictions for its fake images to be close to 1 (real). When that is not the case, the loss is large and the Generator corrects itself through its gradient.
def GLoss(logits_fake, targets_real):
    '''
    The aim of the Generator is to fool the Discriminator into "thinking" the generated images are real.
    GLoss is the binary cross entropy loss between the outputs of the Discriminator with the
    generated fake images 'logits_fake' and real targets 'targets_real'
    Inputs:
    logits_fake: Logits from the Discriminator for the fake images generated by the Generator
    targets_real: groundtruth labels (close to 1) for the logits_fake
    '''
    # estimate the g_loss using the BCEWithLogitsLoss object 'bce_loss' with 'logits_fake' and 'targets_real'
    # your code here
    g_loss = bce_loss(logits_fake, targets_real)
    return g_loss
# Hidden testcases follow
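Putting the two objectives side by side may help. As a sketch, let $\sigma$ be the sigmoid, $D(\cdot)$ the Discriminator's logit output, and $t^{\text{real}}_i$, $t^{\text{fake}}_i$ the target labels; $N$, $x_i$, and $z_i$ are notation introduced here for a batch of $N$ real images and $N$ noise vectors, and $\ell$ is the per-element binary cross-entropy. With the default mean reduction of BCEWithLogitsLoss, the two losses are:

```latex
\ell(s, t) = -\big[\, t \log \sigma(s) + (1 - t) \log\big(1 - \sigma(s)\big) \big]

\mathcal{L}_D = \frac{1}{2N} \sum_{i=1}^{N}
  \Big[ \ell\big(D(x_i),\, t^{\text{real}}_i\big)
      + \ell\big(D(G(z_i)),\, t^{\text{fake}}_i\big) \Big]

\mathcal{L}_G = \frac{1}{N} \sum_{i=1}^{N}
  \ell\big(D(G(z_i)),\, t^{\text{real}}_i\big)
```

With hard labels $t^{\text{real}}_i = 1$, $\mathcal{L}_G$ reduces to the mean of $-\log \sigma(D(G(z_i)))$. This is often called the non-saturating Generator loss: instead of minimizing $\log(1 - \sigma(D(G(z))))$ directly, the Generator maximizes the log-probability that its images are judged real.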
We now set up the optimizers for training the Generator and the Discriminator. The settings below generate good images with this architecture; feel free to adjust them.
Adam optimizer: https://pytorch.org/docs/stable/optim.html#torch.optim.Adam
#The following settings generated realistic images. Feel free to adjust the settings.
epochs = 201
noise_dim = 100
LR = 0.0002
optimizer_G = torch.optim.Adam(generator.parameters(), lr=LR, betas=(0.5, 0.999))
optimizer_D = torch.optim.Adam(discriminator.parameters(), lr=LR, betas=(0.5, 0.999))
## Training loop
for epoch in range(epochs):
    for i, (images, _) in enumerate(train_loader):
        # We set targets_real and targets_fake to non-binary values (soft and noisy labels).
        # This is a hack for stable training of GANs.
        # GAN hacks: https://github.com/soumith/ganhacks#6-use-soft-and-noisy-labels
        targets_real = (torch.FloatTensor(images.size(0), 1).uniform_(0.8, 1.0))
        targets_fake = (torch.FloatTensor(images.size(0), 1).uniform_(0.0, 0.2))
        if is_cuda:
            targets_real = targets_real.cuda()
            targets_fake = targets_fake.cuda()
            images = images.cuda()
        ## D-STEP:
        ## First, clear the gradients of the Discriminator optimizer.
        ## Estimate logits_real by passing images through the Discriminator
        ## Generate fake_images by passing random noise through the Generator. Also, .detach() the fake images
        ## as we don't compute the gradients of the Generator when optimizing the Discriminator.
        ## fake_images = generator(noise(train_bs, noise_dim)).detach()
        ## Estimate logits_fake by passing the fake images through the Discriminator
        ## Compute the Discriminator loss by calling the DLoss function.
        ## Compute the gradients by backpropagating through the computational graph.
        ## Update the Discriminator parameters.
        optimizer_D.zero_grad()
        logits_real = discriminator(images)
        fake_images = generator(noise(train_bs, noise_dim)).detach()
        logits_fake = discriminator(fake_images)
        discriminator_loss = DLoss(logits_real, logits_fake, targets_real, targets_fake)
        discriminator_loss.backward()
        optimizer_D.step()
        ## G-STEP:
        ## Clear the gradients of the Generator.
        ## Generate fake images by passing random noise through the Generator.
        ## Estimate logits_fake by passing the fake images through the Discriminator.
        ## Compute the Generator loss by calling GLoss.
        ## Compute the gradients by backpropagating through the computational graph.
        ## Update the Generator parameters.
        # your code here
        optimizer_G.zero_grad()
        fake_images = generator(noise(train_bs, noise_dim))
        logits_fake = discriminator(fake_images)
        generator_loss = GLoss(logits_fake, targets_real)
        generator_loss.backward()
        optimizer_G.step()
    # Report the losses of the last batch once per epoch
    print("Epoch: ", epoch)
    print("D Loss: ", discriminator_loss.item())
    print("G Loss: ", generator_loss.item())
    # Every second epoch, show the first 100 generated images as a 10x10 grid
    if epoch % 2 == 0:
        viz_batch = fake_images.detach().cpu().numpy()
        viz_batch = viz_batch[:100,:,:,:]
        viz_batch = viz_batch.reshape(-1,28*28).squeeze()
        viz_batch = viz_batch.reshape(10,10, 28,28).transpose(0,2,1,3).reshape(28*10,-1)
        plt.figure(figsize = (8,8))
        plt.axis('off')
        plt.imshow(viz_batch, cmap='gray')
        plt.show()
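The reshape/transpose chain used above to build the image grid can be hard to follow. The following is a small NumPy sketch of the same idea on a toy batch, assuming a hypothetical helper `tile_images` and tiny 2x3 "images" filled with their own index so that the layout is easy to verify by eye.

```python
import numpy as np

def tile_images(batch, rows, cols):
    """Arrange a batch of shape [rows*cols, H, W] into one [rows*H, cols*W] grid.

    Image k lands at grid row k // cols and grid column k % cols.
    """
    n, h, w = batch.shape
    assert n == rows * cols, "batch size must equal rows * cols"
    grid = batch.reshape(rows, cols, h, w)   # axes: [grid_row, grid_col, y, x]
    grid = grid.transpose(0, 2, 1, 3)        # axes: [grid_row, y, grid_col, x]
    return grid.reshape(rows * h, cols * w)

# Toy batch: 6 "images" of size 2x3, image k filled with the value k.
toy = np.stack([np.full((2, 3), k) for k in range(6)])
grid = tile_images(toy, rows=2, cols=3)
print(grid.shape)   # (4, 9)
print(grid)
```

The transpose is what matters: without it, a plain reshape would interleave pixel rows from different images instead of placing whole images next to each other.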